Search results for "Hash function"

showing 10 items of 27 documents

WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes

2018

Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, associated memory access patterns during the probing phase are highly irregular, resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory, which features almost one TB/s bandwidth in comparison to main memory modules of state-of-the-art CPUs with less than 100 GB/s. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of …

020203 distributed computing; Computer science; Hash function; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; Hash table; Electronic mail; Memory management; 010201 computation theory & mathematics; Scalability; 0202 electrical engineering electronic engineering information engineering; Massively parallel; Time complexity; 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

researchProduct

SWMapper: Scalable Read Mapper on SunWay TaihuLight

2020

With the rapid development of next-generation sequencing (NGS) technologies, high throughput sequencing platforms continuously produce large amounts of short read DNA data at low cost. Read mapping is a performance-critical task, being one of the first stages required for many different types of NGS analysis pipelines. We present SWMapper — a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. A number of optimization techniques are proposed to achieve high performance on its heterogeneous architecture which are centered around a memory-efficient succinct hash index data structure including seed filtration, duplicate removal, dynamic scheduling, asynchronous data tra…

020203 distributed computing; Speedup; Xeon; Computer science; Hash function; 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; Supercomputer; Data structure; DNA sequencing; chemistry.chemical_compound; chemistry; Scalability; 0202 electrical engineering electronic engineering information engineering; DNA; Sunway TaihuLight; 49th International Conference on Parallel Processing - ICPP

researchProduct

Stopping injection attacks with code and structured data

2018

Injection attacks top the lists of the most harmful software vulnerabilities. Injection vulnerabilities are both commonplace and easy to exploit, which makes development of injection protection schemes important. In this article, we show how injection attacks can be practically eliminated through the use of structured data paired with cryptographic verification codes upon transmission.

peerReviewed

0301 basic medicine; Exploit; Computer science; Cross-site scripting; Cryptography; Computer security; computer.software_genre; SQL injection; 03 medical and health sciences; 0302 clinical medicine; Software; SQL injection; Code (cryptography); Cryptographic hash function; Proof-carrying code; proof-carrying code; tietoturva (information security); SQL; business.industry; XSS; 030104 developmental biology; injection; 030220 oncology & carcinogenesis; cryptographic hash; business; computer

researchProduct

FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight

2022

One of the most important applications in bioinformatics is read mapping. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper, a highly scalable read mapper on the TaihuLight supercomputer optimized for its fourth-generation ShenWei many-core architecture (SW26010). In order to fully exploit the computational power of the SW26010, we employ dynamic scheduling of tasks, asynchronous I/O and data transfers and implement a vectorized version of the banded Myers algorithm tailored to the 256-bit vector registers of the SW26010. Our perf…

256-bit; Speedup; Xeon; Computer Networks and Communications; Computer science; Hash function; Parallel computing; SW26010; Supercomputer; Theoretical Computer Science; Artificial Intelligence; Hardware and Architecture; Scalability; Software; Sunway TaihuLight; Journal of Parallel and Distributed Computing

researchProduct

Blockchain based Device identification and authentication in a Smart Grid

2020

The power grid is a critical infrastructure of a country that needs protection and security. According to the report of the International Energy Agency, electricity demand is constantly increasing the world over. Countries are moving towards green energy, and efforts are being made to integrate these green energy sources into the main grid. Smart Grid will improve the reliability and efficiency of the grid by managing the energy demand. Cyber-attacks and cyber terrorism are also increasingly targeting the electrical grid. Intruders may try to gain access to the grid by exploiting its vulnerabilities. IEDs/devices are the endpoints of the network and they are the weakest link in the enti…

Authentication; Identification (information); Smart grid; Computer science; Cryptographic hash function; Computer security; computer.software_genre; Grid; Electrical grid; computer; Critical infrastructure; Vulnerability (computing); 2020 5th International Conference on Smart and Sustainable Technologies (SpliTech)

researchProduct

Gossip

2019

Nowadays, a growing number of servers and workstations feature an increasing number of GPUs. However, slow communication among GPUs can lead to poor application performance. Thus, there is a latent demand for efficient multi-GPU communication primitives on such systems. This paper focuses on the gather, scatter and all-to-all collectives, which are important operations for various algorithms including parallel sorting and distributed hashing. We present two distinct communication strategies (ring-based and flow-oriented) to generate transfer plans for their topology-aware implementation on NVLink-connected multi-GPU systems. We achieve a throughput of up to 526 GB/s for all-to-all and 148 G…

CUDA; Computer science; Gossip; Distributed computing; Transfer (computing); Server; Hash function; Overhead (computing); Throughput (business); Proceedings of the 48th International Conference on Parallel Processing

researchProduct

A simple proof of the polylog counting ability of first-order logic

2007

The counting ability of weak formalisms (e.g., determining the number of 1's in a string of length $N$) is of interest as a measure of their expressive power, and also has complexity-theoretic motivations: the more we can count, the closer we get to real computing power. The question was investigated in several papers in complexity theory and in weak arithmetic around 1985. In each case, the considered formalism ($\mathrm{AC}^0$-circuits, first-order logic, $\Delta_0$) was shown to be able to count up to a polylogarithmic number. An essential part of the proofs is the construction of a 1-1 mapping from a small subset of $\{0, \ldots, N-1\}$ into a small initial segment. In each case the expressibility of …

Combinatorics; Discrete mathematics; Multidisciplinary; Computer science; Elementary proof; Hash function; Mathematical proof; Rotation formalisms in three dimensions; Prime number theorem; First-order logic; Coding (social sciences); Initial segment; ACM SIGACT News

researchProduct

Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems

2014

The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations…

Design; Computer science; Distributed computing; Performance; storage management; Hash function; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; USable; 01 natural sciences; Slicing; randomized data distribution; Affordable and Clean Energy; 0202 electrical engineering electronic engineering information engineering; Randomness; Experimentation; scalability; Pseudorandom number generator; business.industry; 020206 networking & telecommunications; Reliability; Data Format; PRNG; 010201 computation theory & mathematics; Hardware and Architecture; Computer data storage; Scalability; Table (database); business; Networking & Telecommunications

researchProduct

WarpCore: A Library for fast Hash Tables on GPUs

2020

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore -- a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines ent…

FOS: Computer and information sciences; Scheme (programming language); Amortized analysis; Computer science; Hash function; Parallel computing; Data structure; Hash table; CUDA; Computer Science - Distributed Parallel and Cluster Computing; Server; Distributed Parallel and Cluster Computing (cs.DC); Throughput (business); computer; computer.programming_language; 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)

researchProduct

MetaCache-GPU: Ultra-Fast Metagenomic Classification

2021

The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their $k$-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growi…

Genomics (q-bio.GN); FOS: Computer and information sciences; Source code; Computer science; media_common.quotation_subject; Hash function; Context (language use); MinHash; computer.software_genre; Data structure; Hash table; Computer Science - Distributed Parallel and Cluster Computing; FOS: Biological sciences; Preprocessor; Quantitative Biology - Genomics; Distributed Parallel and Cluster Computing (cs.DC); Data mining; computer; media_common; Reference genome; 50th International Conference on Parallel Processing

researchProduct